Create a jobserver with N tokens, not N-1 #7344

alexcrichton · 2019-09-09T15:50:01Z

I recently added jobserver support to the cc crate and ended up
running afoul of a jobserver quirk on Windows. Due to how it's
implemented, on Windows you can't actually add more than the initial
number of tokens to the jobserver (it uses an IPC semaphore). On Unix,
however, you can since you're just writing bytes into a pipe.

In cc, however, I found it convenient to control parallelism by simply
releasing a token before the parallel loop, then reacquiring the token
after the loop. That way the loop just has to acquire a token for each
job it wants to spawn and then release it when the job finishes. This is
a bit simpler than trying to juggle the "implicit token" all over the
place as well as coordinating its use. It's technically invalid because
it allows a brief moment of N+1 parallelism since we release a token
and then do a bit of work to acquire a new token, but that's hopefully
not really the end of the world.

In any case this commit updates Cargo's creation of a jobserver to create
it with N tokens instead of N-1. The same semantics are preserved
where Cargo then immediately acquires one of the tokens, but the
difference is that this "implicit token" can be released back to the
jobserver pool, unlike before.
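A minimal sketch of the release/acquire-per-job/reacquire pattern described above. It assumes an in-process counting semaphore as a stand-in for the real jobserver. `TokenPool` and `run_jobs` are hypothetical names made up for illustration; they are not the `cc` or `jobserver` crate APIs.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Toy in-process stand-in for a jobserver: a counting semaphore.
/// (Hypothetical type; the real jobserver is a pipe on Unix and an IPC
/// semaphore on Windows, shared between processes.)
struct TokenPool {
    available: Mutex<usize>,
    cond: Condvar,
}

impl TokenPool {
    fn new(tokens: usize) -> Self {
        TokenPool { available: Mutex::new(tokens), cond: Condvar::new() }
    }

    /// Block until a token is available, then take it.
    fn acquire(&self) {
        let mut n = self.available.lock().unwrap();
        while *n == 0 {
            n = self.cond.wait(n).unwrap();
        }
        *n -= 1;
    }

    /// Return a token to the pool and wake one waiter.
    fn release(&self) {
        *self.available.lock().unwrap() += 1;
        self.cond.notify_one();
    }
}

/// Runs `jobs` dummy jobs: hand back the implicit token, take one token per
/// job, then take the implicit token back at the end.
/// Returns (jobs completed, peak number of jobs running at once).
fn run_jobs(pool: Arc<TokenPool>, jobs: usize) -> (usize, usize) {
    let running = Arc::new(AtomicUsize::new(0));
    let peak = Arc::new(AtomicUsize::new(0));
    let done = Arc::new(AtomicUsize::new(0));

    pool.release(); // hand the implicit token back before the loop

    let mut handles = Vec::new();
    for _ in 0..jobs {
        pool.acquire(); // one token per spawned job
        let (pool, running, peak, done) =
            (pool.clone(), running.clone(), peak.clone(), done.clone());
        handles.push(thread::spawn(move || {
            let now = running.fetch_add(1, Ordering::SeqCst) + 1;
            peak.fetch_max(now, Ordering::SeqCst);
            thread::yield_now(); // stand-in for compiling one file
            running.fetch_sub(1, Ordering::SeqCst);
            done.fetch_add(1, Ordering::SeqCst);
            pool.release(); // job finished: give its token back
        }));
    }
    for h in handles {
        h.join().unwrap();
    }

    pool.acquire(); // take the implicit token back after the loop
    (done.load(Ordering::SeqCst), peak.load(Ordering::SeqCst))
}
```

To mirror the change in this PR, a caller would create the pool with `N` tokens and immediately acquire one before calling `run_jobs`. The sketch only counts spawned jobs, so it does not show the brief `N+1` window in which the spawning process itself is still doing work after giving up its token.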

rust-highfive · 2019-09-09T15:50:04Z

r? @ehuss

(rust_highfive has picked a reviewer for you, use r? to override)

ehuss · 2019-09-11T01:44:00Z

I'm trying to better understand the motivation for this change. I think I see how it allows cc to release its implicit token, but it looks like the cc crate already handles the case where it fails to release it. If it is already being handled, why change cargo?

I think this seems ok, I can't think of any problems, but this is always so tricky to get right.

Also, I assume from an error handling standpoint in the cc crate, that if it fails while running that it doesn't really matter that it doesn't re-acquire the token since cargo will be shutting down anyways?

I'm a little concerned that there's a window after all cc threads have finished where cargo fires off another job, and cc blocks at the end trying to reacquire its implicit token. It could block for a long time just so it can cleanly exit. I guess that doesn't matter since it's not holding any job slots. Is that a correct understanding? And that this PR ensures that the implicit release hack works and it doesn't need to reacquire at the end?

Another question: Should I update cc on rust-lang/rust? I considered using parallel in the past, but building rayon negated any benefits. It looks like it is still using an old version with rayon.

alexcrichton · 2019-09-11T13:28:54Z

> If it is already being handled, why change cargo?

You can see from the release history of cc that I had a few panicked releases in quick succession this past Saturday, realizing (roughly in order):

1. The build script has an implicit token, so if we acquire a separate token for each piece of spawned work, we deadlock.
2. Releasing the token at the beginning is apparently broken on Windows.
3. There were some typos, and cc also just ignores errors when releasing the implicit token.

This is "handled" in cc but it's technically incorrect. This is only a problem on Windows, where the actual cap of the jobserver is enforced (because it's a semaphore), but if you run a build with -j2 and two build scripts start working, then one will fail to release its token while the other will succeed. (I think at most one script will fail to release its token.) This means that there's just one less token to acquire, so the build would actually run with a parallelism of 1.
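The `-j2` case can be walked through with a toy model. This sketch assumes a semaphore whose count can never exceed the value it was created with, as described for Windows. `CappedPool` and `two_build_scripts_j2` are hypothetical names for illustration, not code from cc or Cargo.

```rust
/// Toy model of a capped jobserver: the count can never exceed `max`.
/// (Hypothetical type for illustration only.)
struct CappedPool {
    available: usize,
    max: usize,
}

impl CappedPool {
    /// Take a token if one is available.
    fn try_acquire(&mut self) -> bool {
        if self.available == 0 {
            return false;
        }
        self.available -= 1;
        true
    }

    /// Give a token back; fails if the count would exceed the cap.
    fn try_release(&mut self) -> bool {
        if self.available == self.max {
            return false;
        }
        self.available += 1;
        true
    }
}

/// `-j2` with the jobserver created with N-1 = 1 token.
/// Returns (first release ok, second release ok, tokens left for jobs).
fn two_build_scripts_j2() -> (bool, bool, usize) {
    let mut pool = CappedPool { available: 1, max: 1 };

    // Cargo runs one build script on its implicit token and acquires the
    // single real token for the other.
    assert!(pool.try_acquire());

    // Each build script then tries to release "its" token before spawning work.
    let first = pool.try_release(); // count goes 0 -> 1
    let second = pool.try_release(); // would go 1 -> 2, above the cap

    (first, second, pool.available)
}
```

Under these assumptions `two_build_scripts_j2()` returns `(true, false, 1)`: one release fails, and the two scripts share a single token between them.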

TBH I haven't thought too carefully about this; I was sort of hoping bugs in the wild would guide it and figured the bugs wouldn't be that bad. It's also something I assumed I could pretty easily fix in Cargo come Monday :)

> this is always so tricky to get right.

you're not kidding

> Is that a correct understanding? And that this PR ensure that implicit release hack works and it doesn't need to reacquire at the end?

I believe this is correct, yeah. This is why the way cc is obeying the jobserver protocol is a bit wrong: processes should never literally release their own token but rather just juggle the implicit token between spawned work and themselves trying to spawn more work. This means that there can be a bunch of build script processes hogging memory (one of the biggest reasons for -j) which won't go away until other jobs complete. In the case of build scripts it's probably not that big of a deal though.

This PR basically ensures that the initial release for the build script actually works, because currently it doesn't on Windows. Most initial releases work but if you run -j8 and a build script is the only process running, then the jobserver actually has 7 tokens max and no tokens were ever read (the implicit one was passed around) so the initial release will fail. This PR, however, makes it so the 8th token was actually acquired at the beginning, so the build script can indeed successfully release it.
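The `-j8` example above comes down to a small piece of arithmetic. This sketch assumes a capped, Windows-style semaphore and a single running build script. `lone_build_script` is a hypothetical helper written for illustration, not Cargo's code.

```rust
/// Models a lone build script under `-j<jobs>` on a capped semaphore.
/// Returns (tokens other processes can acquire, whether the build script's
/// initial release succeeds). Assumes `jobs >= 1`.
fn lone_build_script(jobs: usize, create_with_n: bool) -> (usize, bool) {
    // The cap is fixed at creation: N with this PR, N-1 before it.
    let max = if create_with_n { jobs } else { jobs - 1 };
    let mut available = max;
    if create_with_n {
        // Cargo immediately acquires one token to act as its implicit token.
        available -= 1;
    }
    // The build script tries to hand the implicit token back to the pool;
    // this only works if the count is currently below the cap.
    let release_ok = available < max;
    (available, release_ok)
}
```

With these assumptions, `lone_build_script(8, false)` gives `(7, false)` and `lone_build_script(8, true)` gives `(7, true)`. Other processes see the same 7 tokens either way, but only in the second case can the implicit token be released.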

I was also wondering if it's worth reacquiring the token, but the cc crate does actually run ar which can be somewhat expensive so there's at least a tiny bit more than a clean exit. (that and cc is just a library, so a build script itself may actually do much more).

> Should I update cc on rust-lang/rust?

Yeah I think it should be safe to update, this probably only affects Cargo's dependency graph and that's already tested on CI with the latest libgit2 and such!

ehuss · 2019-09-11T21:57:23Z

Ok! 😄

@bors r+

bors · 2019-09-11T21:57:24Z

📌 Commit cab8640 has been approved by ehuss

bors · 2019-09-11T21:57:31Z

⌛ Testing commit cab8640 with merge a0be578...

bors · 2019-09-11T22:21:50Z

☀️ Test successful - checks-azure
Approved by: ehuss
Pushing a0be578 to master...

rust-highfive assigned ehuss Sep 9, 2019

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Sep 9, 2019

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 11, 2019

bors merged commit cab8640 into rust-lang:master Sep 11, 2019

alexcrichton deleted the jobserver branch September 16, 2019 18:24

ehuss mentioned this pull request Dec 10, 2019

cargo seems to start several build jobs with -j1 (but doesn't?) #7689

Open

ehuss mentioned this pull request Jan 9, 2020

cargo doesn't respect the maximum number of jobs #7772

Closed

ehuss added this to the 1.39.0 milestone Feb 6, 2022

tgnottingham mentioned this pull request Aug 19, 2023

Issues with releasing and reacquiring implicit jobserver token rust-lang/cc-rs#858

Closed
